Regular Expressions with Numerical Occurrence Indicators - preliminary results
Abstract
Regular expressions with numerical occurrence indicators (#REs) are used in established text manipulation tools like Perl and Unix egrep, and in the recent W3C XML Schema Definition Language. Numerical occurrence indicators do not increase the expressive power of regular expressions, but they do increase the succinctness of expressions by an exponential factor. Therefore methods based on straightforward translation of #REs into corresponding standard regular expressions are computationally infeasible in the general case. We report some preliminary results about computational problems related to efficient matching and comparison of #REs. Matching, or membership testing of languages described by #REs, is shown to be tractable. Simple comparison problems (inclusion and overlap) of #REs are shown to be NP-hard. We also consider simple #REs consisting of a single symbol and nested numerical occurrence indicators only, and derive a simple numerical test for the membership of a word in the language described by a simple #RE.
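As an illustration of the last point, membership in a simple #RE reduces to a question about word length, since only one symbol is involved. The sketch below is not the numerical test derived in the paper, which the abstract does not spell out. It is a plain dynamic-programming check over achievable lengths, under these assumptions: a simple #RE is given as a list of (min, max) counters applied to a single symbol, innermost first, with max = None meaning unbounded and min <= max otherwise. The names `repeat_lengths`, `lengths_up_to`, and `matches_simple` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: (min, max) with max=None meaning "unbounded".
Counter = Tuple[int, Optional[int]]


def repeat_lengths(lengths: Set[int], lo: int, hi: Optional[int],
                   limit: int) -> Set[int]:
    """Lengths <= limit obtainable by concatenating k words whose lengths
    come from `lengths`, for some k with lo <= k <= hi."""
    result: Set[int] = set()
    reach = {0}  # sums of exactly k lengths, starting with k = 0
    k = 0
    while True:
        if k >= lo:
            result |= reach
        if hi is not None and k >= hi:
            break
        nxt = {s + l for s in reach for l in lengths if s + l <= limit}
        if not nxt:
            break  # every further repetition overshoots the limit
        if nxt == reach:
            # Fixed point (only possible when 0 is in `lengths`):
            # all larger k yield the same set.
            result |= nxt
            break
        reach = nxt
        k += 1
    return result


def lengths_up_to(counters: List[Counter], limit: int) -> Set[int]:
    """Word lengths <= limit in the language of the simple #RE."""
    lengths = {1}  # the bare symbol
    for lo, hi in counters:  # innermost counter first
        lengths = repeat_lengths(lengths, lo, hi, limit)
    return lengths


def matches_simple(counters: List[Counter], n: int) -> bool:
    """True if the word of length n belongs to the simple #RE."""
    return n in lengths_up_to(counters, n)


if __name__ == "__main__":
    # (a{2,3}){4,5}: lengths 8 through 15 are accepted.
    print(matches_simple([(2, 3), (4, 5)], 8))
    # (a{5,6}){1,2}: the accepted lengths have a gap between 6 and 10.
    print(matches_simple([(5, 6), (1, 2)], 8))
```

The running time is polynomial in the word length, because no expression is ever expanded. Translating a nested counter such as ((a{2}){2}){2} into a standard regular expression instead doubles the size with each nesting level, which is the exponential blow-up the abstract refers to.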
Similar resources
One-unambiguity of regular expressions with numeric occurrence indicators
Regular expressions with numeric occurrence indicators are an extension of traditional regular expressions, which let the required minimum and the allowed maximum number of iterations of subexpressions be described with numeric parameters. We consider the problem of testing whether a given regular expression E with numeric occurrence indicators is 1-unambiguous or not. This condition means, inf...
Optimizing Schema Languages for XML: Numerical Constraints and Interleaving
The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is thereby paramount to establish the exact complexity of ...
Inclusion of Unambiguous RE#s is NP-Hard
We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (#REs) is NP-hard, even if the expressions satisfy the requirement of “unambiguity”, which is required for XML Schema content model expressions. 1 Proof of the result We have seen before [3] that testing for inclusion and overlap of languages represented by #REs is NP-hard. T...
The Membership Problem for Regular Expressions with Unordered Concatenation and Numerical Constraints
We study the membership problem for regular expressions extended with operators for unordered concatenation and numerical constraints. The unordered concatenation of a set of regular expressions denotes all sequences consisting of exactly one word denoted by each of the expressions. Numerical constraints are an extension of regular expressions used in many applications, e.g. text search (e.g., ...
Shorter Regular Expressions from Finite-State Automata
We consider the use of state elimination to construct shorter regular expressions from finite-state automata. Although state elimination is an intuitive method for computing regular expressions from finite-state automata, the resulting regular expressions are often very long and complicated. We examine the minimization of finite-state automata to obtain shorter expressions first. Then, we introd...
Publication date: 2003